The Bellman equation in (2.7) is in an elementwise form. Since it is valid for every state, we can combine all these equations and write them concisely in a matrix-vector form, which will be frequently used to analyze the Bellman equation.
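To derive the matrix-vector form, we first rewrite the Bellman equation in (2.7) as

$$v_\pi(s) = r_\pi(s) + \gamma \sum_{s' \in \mathcal{S}} p_\pi(s'|s)\, v_\pi(s'), \tag{2.8}$$

where

$$r_\pi(s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s) \sum_{r \in \mathcal{R}} p(r|s,a)\, r, \qquad p_\pi(s'|s) \doteq \sum_{a \in \mathcal{A}} \pi(a|s)\, p(s'|s,a).$$

Here, $r_\pi(s)$ denotes the mean of the immediate rewards, and $p_\pi(s'|s)$ is the probability of transitioning from $s$ to $s'$ under policy $\pi$.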
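The two policy-averaged quantities can be computed directly from a tabular model. The following is a minimal sketch, assuming a hypothetical two-state, two-action model; the state names, action names, probabilities, and the helpers `r_pi` and `p_pi` are all illustrative and do not come from the text.

```python
# Hypothetical toy model (not from the text): 2 states, 2 actions.
states = ["s1", "s2"]
actions = ["a1", "a2"]

# pi[s][a] = pi(a|s), the policy.
pi = {
    "s1": {"a1": 0.5, "a2": 0.5},
    "s2": {"a1": 1.0, "a2": 0.0},
}

# p_next[(s, a)][s_next] = p(s'|s, a), the state transition model.
p_next = {
    ("s1", "a1"): {"s1": 0.0, "s2": 1.0},
    ("s1", "a2"): {"s1": 1.0, "s2": 0.0},
    ("s2", "a1"): {"s1": 0.0, "s2": 1.0},
    ("s2", "a2"): {"s1": 1.0, "s2": 0.0},
}

# p_reward[(s, a)][r] = p(r|s, a), the reward model.
p_reward = {
    ("s1", "a1"): {1.0: 1.0},
    ("s1", "a2"): {-1.0: 1.0},
    ("s2", "a1"): {1.0: 1.0},
    ("s2", "a2"): {0.0: 1.0},
}


def r_pi(s):
    """Mean immediate reward: sum_a pi(a|s) * sum_r p(r|s,a) * r."""
    return sum(
        pi[s][a] * sum(prob * r for r, prob in p_reward[(s, a)].items())
        for a in actions
    )


def p_pi(s_next, s):
    """Transition probability under the policy: sum_a pi(a|s) * p(s'|s,a)."""
    return sum(pi[s][a] * p_next[(s, a)][s_next] for a in actions)


r_vals = {s: r_pi(s) for s in states}
p_vals = {s: {s_next: p_pi(s_next, s) for s_next in states} for s in states}
```

Averaging over actions first removes the action from the model, so the remaining quantities depend only on the current state and the next state.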
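Suppose that the states are indexed as $s_i$ with $i = 1, \dots, n$, where $n = |\mathcal{S}|$. For state $s_i$, (2.8) can be written as

$$v_\pi(s_i) = r_\pi(s_i) + \gamma \sum_{s_j \in \mathcal{S}} p_\pi(s_j|s_i)\, v_\pi(s_j). \tag{2.9}$$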
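Let $v_\pi = [v_\pi(s_1), \dots, v_\pi(s_n)]^T \in \mathbb{R}^n$, $r_\pi = [r_\pi(s_1), \dots, r_\pi(s_n)]^T \in \mathbb{R}^n$, and $P_\pi \in \mathbb{R}^{n \times n}$ with $[P_\pi]_{ij} = p_\pi(s_j|s_i)$. Then, (2.9) can be written in the following matrix-vector form:

$$v_\pi = r_\pi + \gamma P_\pi v_\pi,$$

where $v_\pi$ is the unknown to be solved, and $r_\pi$ and $P_\pi$ are known.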
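To make the matrix-vector form concrete, the sketch below evaluates its right-hand side for a hypothetical two-state problem and finds a state-value vector that satisfies it. The numbers in `r` and `P`, the discount rate `gamma = 0.9`, and the helper `bellman_rhs` are assumptions made for illustration. The solution is obtained by repeated substitution, which is a choice made here and not a method stated in this section.

```python
gamma = 0.9  # discount rate (assumed value)

# Hypothetical data: r[i] = r_pi(s_i), P[i][j] = p_pi(s_j | s_i).
r = [0.0, 1.0]
P = [
    [0.5, 0.5],
    [0.0, 1.0],
]


def bellman_rhs(v):
    """Return r_pi + gamma * P_pi v, the right-hand side of the matrix-vector form."""
    n = len(v)
    return [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]


# Repeated substitution v <- r_pi + gamma * P_pi v, starting from zero.
v = [0.0, 0.0]
for _ in range(1000):
    v = bellman_rhs(v)

# At a solution, both sides of the matrix-vector equation agree.
residual = max(abs(lhs - rhs) for lhs, rhs in zip(v, bellman_rhs(v)))
```

Row `i` of the computation reproduces the elementwise equation for state $s_i$, so a vector with a vanishing `residual` satisfies all $n$ equations at once.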
The matrix $P_\pi$ has some interesting properties.
First, it is a nonnegative matrix, meaning that all its elements are equal to or greater than zero. This property is denoted as $P_\pi \geq 0$, where $0$ denotes a zero matrix with appropriate dimensions. In this book, $\geq$ or $\leq$ represents an elementwise comparison operation.
Second, $P_\pi$ is a stochastic matrix, meaning that the sum of the values in every row is equal to one. This property is denoted as $P_\pi \mathbf{1} = \mathbf{1}$, where $\mathbf{1} = [1, \dots, 1]^T$ has appropriate dimensions.
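Both properties are easy to check numerically. The following sketch tests them on a hypothetical $2 \times 2$ matrix; the matrix entries, the helper names `is_nonnegative` and `rows_sum_to_one`, and the tolerance `tol` are assumptions made for illustration.

```python
# Hypothetical transition matrix under some policy: P[i][j] = p_pi(s_j | s_i).
P = [
    [0.5, 0.5],
    [0.0, 1.0],
]


def is_nonnegative(matrix):
    """Elementwise check that every entry is >= 0."""
    return all(x >= 0 for row in matrix for x in row)


def rows_sum_to_one(matrix, tol=1e-12):
    """Check that the matrix times the all-ones vector gives the all-ones vector."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


nonnegative = is_nonnegative(P)
stochastic = rows_sum_to_one(P)

# A matrix whose first row sums to 0.9 fails the second check.
bad_rows = rows_sum_to_one([[0.5, 0.4], [0.0, 1.0]])
```

The tolerance is there because row sums computed in floating point may differ from one by a rounding error.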
Consider the example shown in Figure 2.6. The matrix-vector form of the Bellman equation is
Substituting the specific values into the above equation gives
It can be seen that $P_\pi$ satisfies $P_\pi \mathbf{1} = \mathbf{1}$.
Figure 2.6: An example for demonstrating the matrix-vector form of the Bellman equation.